% A LaTeX file that describes technical details in ALE, including protocols and compiling.

\documentclass[12pt]{article}

\usepackage{hyperref}
\usepackage{fullpage}

\title{Arcade Learning Environment\\ Technical Manual (v.0.5.1)}
\author{Marlos C. Machado, Matthew Hausknecht, Marc G. Bellemare}

\begin{document}

\maketitle

\clearpage

\tableofcontents

\clearpage

\section{Overview}

This document is roughly divided into three parts. The first part describes how to install the Arcade Learning Environment (ALE). The second part describes the various ALE interfaces currently available: 
\begin{enumerate}
  \item \textbf{Shared Library interface} (C++ only): Loads ALE as a shared library (Section 
  \ref{sec:shared_library_interface}).
  \item \textbf{CTypes interface} (Python only): A fast Python interface to ALE, provided as a Python package (Section \ref{sec:python_interface}).
  \item \textbf{FIFO interface} (all languages): Communicates with ALE through a text interface (Section \ref{sec:pipes_interface}).
  \item \textbf{RL-Glue interface} (C/C++, Java, Python, Matlab, Lisp, Go): Communicates with ALE via RL-Glue (Section \ref{sec:rlglue_interface}).
\end{enumerate}
The final part of this document discusses the different features of ALE, including the action stochasticity parameter and video recording.

ALE also provides an example Java agent, not discussed in this manual, which uses the FIFO interface. This agent includes code for human input as well as a simple SARSA implementation. Details on the Java agent, including installation instructions, may be found under \verb+doc/java-agent+.

\section{Installing}\label{sec:install}

\subsection{Requirements}

The basic requirements to build and run ALE are:

\begin{itemize}
  \item CMake or \verb+make+
  \item A C++ compiler
\end{itemize}

Two custom makefiles are provided, \verb+makefile.mac+ (Mac OS X) and \verb+makefile.unix+ (Linux). However, beginning with version 0.5.0, we highly recommend the use of CMake rather than the custom makefiles.

Additionally, the following options are not activated by default and require additional packages:
\begin{itemize}
  \item SDL display and audio support
  \item RL-Glue support
\end{itemize}

\subsection{Installation/Compilation}\label{subsec:installation_compilation}

ALE's requirements must be installed before compiling ALE itself. 
We assume that a suitable C++ compiler (one which includes \verb+make+) is already available to 
the user. Installing CMake (and if desired, SDL) is straightforward, and can be done through 
package managers: 

\subsubsection*{Mac OS X (using Homebrew\footnote{http://brew.sh}):}
\begin{verbatim}
  > brew install cmake
  > brew install sdl
\end{verbatim}

\subsubsection*{Linux (using apt, e.g. Ubuntu):}
\begin{verbatim}
  > sudo apt-get install cmake
  > sudo apt-get install libsdl1.2-dev
\end{verbatim}

For instructions on installing RL-Glue, see Section \ref{sec:rlglue_interface}.

We are going to assume ALE was extracted to \verb+ale_0_5+. Then, compiling it
with CMake is very simple (on both systems):
\begin{verbatim}
  > cd ale_0_5
  ale_0_5> cmake -DUSE_SDL=ON -DUSE_RLGLUE=OFF -DBUILD_EXAMPLES=ON .
  ale_0_5> make -j 4
\end{verbatim}

This compiles the code with SDL but not RL-Glue, and builds the example C++ agents described
in this document. The first two options are disabled by default, while the third is
enabled; default options may be omitted from the command line. Note that SDL and CMake sometimes 
don't play well together; please refer to Troubleshooting 
(Section~\ref{sec:troubleshooting}) if compilation fails, e.g. because of a missing \verb+SDL.h+.

\section{C++ Agent: Shared Library Interface} \label{sec:shared_library_interface}

The shared library interface is the simplest way to implement a C++ agent for ALE.
This interface allows agents to directly access ALE via a class called
\verb+ALEInterface+, defined in \verb+ale_interface.hpp+. Example code detailing a simple random agent is provided under \verb+doc/examples/sharedLibraryInterfaceExample.cpp+.

To implement an agent, the first step is to include the header \verb+ale_interface.hpp+, either via 
the relative path \verb+#include "path/from/your/code/ale_interface.hpp"+, or as a standard 
header: \verb+#include <ale_interface.hpp>+. If the latter is chosen, remember to add the proper 
path using the flag \verb+-I+ when compiling the code.

To instantiate the Arcade Learning Environment it is enough to write:\\

\verb+ALEInterface ale;+\\

Once the environment is initialized, it is now possible to set its arguments. This is done with the 
functions \verb+setBool(), setInt(), setFloat()+. The complete list of flags is available in 
Section~\ref{sec:arguments}. Just as an example, to set the environment's seed we write:\\

\verb+ale.setInt("random_seed", 123);+\\

Finally, after setting the desired environment parameters we now load the game ROM by providing its filename to the \verb+loadROM+ method:\\

\verb+ale.loadROM("asterix.bin");+\\

There are two different action sets provided by ALE: the ``legal'' set and the ``minimal'' 
set. Save for a few rare exceptions, the legal action set consists of all 18 actions for all games, including duplicates and actions with no effect. On the other hand, the minimal action set for a game contains only 
the actions that have some effect on that game. The \verb+getLegalActionSet+ and \verb+getMinimalActionSet+ methods provide the desired action sets:\\ 

\verb+ActionVect legal_actions = ale.getLegalActionSet();+\\

Taking an action is done by calling the function \verb+act()+ passing an object of \verb+Action+ as a parameter:\\

\verb+Action a = legal_actions[rand() % legal_actions.size()];+\\
\indent \verb+float reward = ale.act(a);+\\

Finally, one can check whether the episode has terminated using the function \verb+ale.game_over()+. With these 
functions one can already implement a very simple agent that plays randomly for one episode:

\begin{verbatim}
#include <cstdlib>
#include <iostream>
#include <ale_interface.hpp>

using namespace std;

int main(int argc, char** argv) {
    if (argc < 2) {
        std::cerr << "Usage: " << argv[0] << " rom_file" << std::endl;
        return 1;
    }

    ALEInterface ale;
    ale.setInt("random_seed", 123);
    ale.loadROM(argv[1]);

    ActionVect legal_actions = ale.getLegalActionSet();

    // Play one episode, choosing uniformly among the legal actions.
    float totalReward = 0.0;
    while (!ale.game_over()) {
        Action a = legal_actions[rand() % legal_actions.size()];
        float reward = ale.act(a);
        totalReward += reward;
    }
    std::cout << "The episode ended with score: " << totalReward
        << std::endl;
    return 0;
}
\end{verbatim}

Compiling with the shared library is done by appending the \verb+-lale+ flag. 
See the 

\begin{center} \verb+doc/examples/sharedLibraryInterfaceExample.cpp+\end{center}

\noindent example for more details, including compilation, as
well as Section \ref{sec:troubleshooting} if errors arise.
A complete list of functions available in the class \verb+ALEInterface+ is given in Section~\ref{sec:functions}.

\section{Python Agent: CTypes Interface}\label{sec:python_interface}

The Python interface must be installed after ALE has been compiled. With root/superuser access:

\verb+pip install .+\\

\noindent without root/superuser access:

\verb+pip install --user .+\\

This will install the package \verb+ale_python_interface+ which can be imported as usual. Example code is available under 
\begin{center} \verb+doc/examples/python_example.py+. \end{center}

Aside from a few minor differences, the Python interface mirrors the C++ interface. For example, the following implements a random agent: 

\begin{verbatim}
from __future__ import print_function  # print() on Python 2 and 3
import sys
from random import randrange
from ale_python_interface import ALEInterface

if len(sys.argv) < 2:
  print('Usage:', sys.argv[0], 'rom_file')
  sys.exit()

ale = ALEInterface()
ale.setInt('random_seed', 123)
ale.loadROM(sys.argv[1])

# Get the list of legal actions
legal_actions = ale.getLegalActionSet()

# Play one episode, choosing uniformly among the legal actions
total_reward = 0
while not ale.game_over():
  a = legal_actions[randrange(len(legal_actions))]
  reward = ale.act(a)
  total_reward += reward
print('Episode ended with score:', total_reward)
\end{verbatim}

\section{FIFO Interface}\label{sec:pipes_interface}

The FIFO interface is text-based and can optionally run-length encode the screen. This section documents the actual protocol used; sample code implementing this protocol in Java is also included in this release.

After preliminary handshaking, the FIFO interface enters a loop in which ALE sends information about the current time step and the agent responds with both players' actions (in general agents will only control the first player). The loop is exited when one of a number of termination conditions occurs.

\subsection{Handshaking}

ALE first sends the width and height of its screen matrix as a hyphen-separated string:

\begin{verbatim}
www-hhh\n
\end{verbatim}

\noindent where \verb+www+ and \verb+hhh+ are both integers. The agent then responds with a comma-separated string:

\begin{verbatim}
s,r,k,R\n
\end{verbatim}

\noindent where \verb+s+, \verb+r+, \verb+R+ are 1 or 0 to indicate that ALE should or should not send, at every time step, screen, RAM and episode-related information (see below for details). The third argument, \verb+k+, is deprecated and currently ignored.
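
As a rough illustration, the handshake could be handled in Python as sketched below. The helper names \verb+parse_dimensions+ and \verb+build_handshake_reply+ are hypothetical and not part of ALE, and sending 0 for the deprecated field \verb+k+ is an assumption (its value is ignored).

\begin{verbatim}
def parse_dimensions(line):
    # ALE's greeting has the form "www-hhh" followed by a newline.
    width, height = line.strip().split("-")
    return int(width), int(height)


def build_handshake_reply(send_screen, send_ram, send_episode):
    # Field order is s,r,k,R; the deprecated k is sent as 0 (assumption).
    flags = [int(send_screen), int(send_ram), 0, int(send_episode)]
    return ",".join(str(flag) for flag in flags) + "\n"
\end{verbatim}

For instance, \verb+build_handshake_reply(True, False, True)+ yields the line \verb+1,0,0,1+.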

\subsection{Main Loop -- ALE}

After handshaking, ALE will then loop until one of the termination conditions occurs; these conditions are described below in Section \ref{subsec:termination_conditions}. If terminating, ALE sends

\begin{verbatim}
DIE\n
\end{verbatim}

Otherwise, ALE sends

\begin{verbatim}
<RAM_string><screen_string><episode_string>\n
\end{verbatim}

\noindent where each of the three strings is either the empty string (if the agent did not request this
particular piece of information), or the relevant data terminated by a colon.

\subsubsection{\texttt{RAM\_string}}

The RAM string is 128 2-digit hexadecimal numbers, with the $i^{th}$ pair denoting the
$i^{th}$ byte of RAM; in total this string is 256 characters long, not including the terminating
`:'.
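
A minimal Python 3 sketch of decoding this string is given below; \verb+decode_ram+ is a hypothetical helper name, not an ALE function.

\begin{verbatim}
def decode_ram(ram_string):
    # Drop the terminating ':' if it is still attached.
    hex_digits = ram_string.rstrip(":")
    if len(hex_digits) != 256:
        raise ValueError("expected 256 hexadecimal characters")
    # bytes.fromhex turns each pair of hex digits into one byte.
    return bytes.fromhex(hex_digits)
\end{verbatim}

The result is a 128-byte sequence in which element $i$ holds the $i^{th}$ byte of RAM.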

\subsubsection{\texttt{screen\_string}}

In ``full'' mode, the screen string is \texttt{www} $\times$ \texttt{hhh} 2-digit hexadecimal numbers, each representing a pixel. Pixels are sent row by row, with \texttt{www} pixels (that is, 2 $\times$ \texttt{www} characters) for each row. In total this string is 2 $\times$ \texttt{www} $\times$ \texttt{hhh} characters long.

In run-length encoding mode, the screen string consists of a variable number of (colour, length) pairs denoting a run-length encoding of the screen, also row by row. Both colour and length are described using 2-digit hexadecimal numbers. Each pair indicates that the next `length' pixels take on the given colour; run length is thus limited to 255. Runs may wrap around onto the next row. The encoding terminates when the last pixel (i.e. the bottom-right pixel) is encoded. The length of this string is 4 characters per (colour, length) pair, and varies depending on the screen.

In either case, the screen string is terminated by a colon.
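
Both encodings can be decoded along the lines of the following Python 3 sketch. The function names are hypothetical; the agent is assumed to know which mode ALE was started in, since the string itself does not say.

\begin{verbatim}
def _to_rows(pixels, width, height):
    if len(pixels) != width * height:
        raise ValueError("pixel count does not match the screen size")
    return [pixels[r * width:(r + 1) * width] for r in range(height)]


def decode_full(screen_string, width, height):
    # Full mode: one 2-digit hexadecimal number per pixel.
    pixels = list(bytes.fromhex(screen_string.rstrip(":")))
    return _to_rows(pixels, width, height)


def decode_rle(screen_string, width, height):
    # Run-length mode: (colour, length) pairs of 2-digit hex numbers.
    data = bytes.fromhex(screen_string.rstrip(":"))
    if len(data) % 2 != 0:
        raise ValueError("incomplete (colour, length) pair")
    pixels = []
    for i in range(0, len(data), 2):
        colour, length = data[i], data[i + 1]
        pixels.extend([colour] * length)  # runs may wrap onto the next row
    return _to_rows(pixels, width, height)
\end{verbatim}

Both functions return a list of \texttt{hhh} rows, each holding \texttt{www} colour values.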

\subsubsection{\texttt{episode\_string}}

The episode string contains two comma-separated integers indicating episode termination (1 for
termination, 0 otherwise) and the most recent reward. It is also colon-terminated.

\subsubsection{Example}

Assuming that the agent requested screen, RAM and episode-related information, a string sent by ALE might look like:

\begin{verbatim}
000100...A401B2:3C3C3C3C00003C3C3C...4F4F0000:0,1:\n
^ 2x128 characters   ^ 2x160x210 characters    ^ongoing episode, reward of 1
\end{verbatim}
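
A line such as this one could be split into its parts with the Python sketch below. The name \verb+split_message+ is hypothetical; the three flags are assumed to match what the agent requested during handshaking, and the reward is parsed as an integer as described above.

\begin{verbatim}
def split_message(line, want_ram, want_screen, want_episode):
    line = line.rstrip("\n")
    if line == "DIE":
        return None  # ALE is terminating
    fields = line.split(":")
    fields.pop()  # each field ends with ':', so the last piece is empty
    result = {}
    if want_ram:
        result["ram"] = fields.pop(0)
    if want_screen:
        result["screen"] = fields.pop(0)
    if want_episode:
        terminal, reward = fields.pop(0).split(",")
        result["terminal"] = (terminal == "1")
        result["reward"] = int(reward)
    return result
\end{verbatim}

The RAM and screen entries are returned as undecoded hexadecimal strings, without their terminating colons.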

\subsection{Main Loop -- Agent}

After receiving a string from ALE, the agent should now send the actions of player A and player B.
These are sent as a pair of comma-separated integers on a single line, e.g.:

\begin{verbatim}
2,18\n
\end{verbatim}

\noindent where the first integer is player A's action (here, \textsc{up}) and the second integer, player B's action (here, \textsc{noop}). Emulator control (reset, save/load state) is also handled by sending a special action value as player A's action. See Section \ref{sec:available_actions} for the list of available actions.
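
Putting the handshake and both loops together, a single-episode agent might look like the following Python sketch. It assumes two text streams connected to ALE's output and input (how these are set up is left out), requests only episode information, always leaves Player B at noop (18), and stops at the first end-of-episode signal instead of sending a reset. The names \verb+run_agent+ and \verb+choose_action+ are hypothetical.

\begin{verbatim}
def run_agent(ale_in, ale_out, choose_action):
    # Handshake: read "www-hhh", then request episode information only.
    ale_in.readline()
    ale_out.write("0,0,0,1\n")
    ale_out.flush()

    total_reward = 0
    for line in ale_in:
        line = line.strip()
        if line == "DIE":
            break
        # With only episode information requested, the line is "t,r:".
        terminal, reward = line.rstrip(":").split(",")
        total_reward += int(reward)
        if terminal == "1":
            break
        # Player A's action, then Player B's noop (18).
        ale_out.write("{},18\n".format(choose_action()))
        ale_out.flush()
    return total_reward
\end{verbatim}

Here \verb+choose_action+ is any function returning Player A's next action as an integer.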

\subsection{Termination}\label{subsec:termination_conditions}

ALE will terminate (and potentially send a \verb+DIE+ message to the agent) when one of the following conditions occurs:

\begin{itemize}
  \item{\texttt{stdin} is closed, indicating that the agent is no longer sending data, or}
  \item{the maximum number of frames (user-specified, with no maximum by default) has been reached.}
\end{itemize}

ALE will send an end-of-episode signal when one of the following is true:

\begin{itemize}
  \item{The maximum number of frames for this episode (user-specified, with no maximum by default) has been reached, or}
  \item{the game has ended, usually when player A loses their last life.}
\end{itemize}

\section{RL-Glue Interface}\label{sec:rlglue_interface}

The RL-Glue interface implements the RL-Glue 3.0 protocol.
It requires the user to first install the RL-Glue core. Additionally, the example agent and 
environment require the RL-Glue C/C++ codec. Both of these can be found on the RL-Glue web
site\footnote{http://glue.rl-community.org}.

In order to use the RL-Glue interface, ALE must be compiled with RL-Glue support. This is achieved
by invoking CMake with \verb+-DUSE_RLGLUE=ON+ (see Section \ref{sec:install}) or, if using
custom makefiles, setting \verb+USE_RLGLUE := 1+. 

Specifying the command-line argument \verb+-game_controller rlglue+ is sufficient to start ALE in 
RL-Glue mode. It will then communicate with the RL-Glue core like a regular RL-Glue environment.

\subsection{Sample Agent and Experiment}

Recall that RL-Glue is a combination of four processes: a core, an experiment, an agent, and
an environment. The core is provided by the RL-Glue library itself, and ALE is the environment
here. An example experiment and agent are provided in 

\begin{verbatim}
doc/examples/RLGlueExperiment.c
doc/examples/RLGlueAgent.c
\end{verbatim}

Assuming you installed ALE under \verb+ale_0_5+, the RL-Glue agent and experiment
can be compiled with the following command: 

\begin{verbatim}
ale_0_5/doc/examples> make -f Makefile.rlglue 
\end{verbatim}

You will then need to start the following processes to run the sample RL-Glue agent in ALE:

\begin{verbatim}
ale_0_5> rl_glue
ale_0_5> doc/examples/RLGlueAgent
ale_0_5> doc/examples/RLGlueExperiment
ale_0_5> ./ale -game_controller rlglue
\end{verbatim}

Additional options (such as those described below) can also be passed as command-line arguments.
Please refer to the RL-Glue documentation for more details. 


\subsection{Actions and Observations}

The action space consists of both Player A and Player B's actions (see Section 
\ref{sec:available_actions}
for details). In general, Player B's action may safely be set to noop (18) but it should be left out
altogether if the \verb+restricted_action_set+ option is set to true. 

The observation space depends on whether the \verb+send_rgb+ option is enabled. 
 When enabled, the observation space consists of 100,928 
 integers: first the 128 bytes of RAM (taking values in 0--255), followed by 100,800 bytes 
 describing the screen.
 Each pixel is described by three bytes, taking values in 0--255, specifying the pixel's 
 red, green and blue components in that order. The screen is provided in row-order, 
beginning with the 160 pixels that compose the first row.
 
If \verb+send_rgb+ is disabled (this is the default), the observation space consists of 33,728 integers: first the 128 bytes of RAM, then the 33,600 screen pixels (in NTSC format; values in 0--127).
These pixels are also provided in row order.
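
On the agent side, such an observation could be split into RAM and screen as in the Python sketch below. It assumes the observation has already been obtained from RL-Glue as a flat sequence of integers; \verb+split_observation+ is a hypothetical helper and not part of ALE or RL-Glue.

\begin{verbatim}
RAM_SIZE = 128
WIDTH, HEIGHT = 160, 210


def split_observation(observation, send_rgb=False):
    ram = list(observation[:RAM_SIZE])
    screen = list(observation[RAM_SIZE:])
    channels = 3 if send_rgb else 1
    if len(screen) != WIDTH * HEIGHT * channels:
        raise ValueError("unexpected observation length")
    if send_rgb:
        # Group consecutive (red, green, blue) values into pixels.
        pixels = [tuple(screen[i:i + 3])
                  for i in range(0, len(screen), 3)]
    else:
        pixels = screen
    rows = [pixels[r * WIDTH:(r + 1) * WIDTH] for r in range(HEIGHT)]
    return ram, rows
\end{verbatim}

The returned screen is a list of 210 rows of 160 pixels, each pixel being a single value or an RGB triple depending on \verb+send_rgb+.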

\section{Environment Specifications}\label{sec:environment_specifications}

This section provides additional information regarding the environment implemented in ALE.

\subsection{Available Actions}\label{sec:available_actions}

The following regular actions are defined in \verb+common/Constants.h+ and interpreted by ALE:

\begin{center}
\small{
\begin{tabular}{|r|r|r|r|r|}
\hline
noop (0) & fire (1) & up (2) & right (3) & left (4) \\
\hline
down (5) & up-right (6) & up-left (7) & down-right (8) & down-left (9) \\
\hline
up-fire (10) & right-fire (11) & left-fire (12) & down-fire (13) & up-right-fire (14) \\
\hline
up-left-fire (15) & down-right-fire (16) & down-left-fire (17) & reset* (40) & \\
\hline
\end{tabular}
}
\end{center}

Note that the \verb+reset+ (40) action toggles the Atari 2600 switch, rather than resetting the 
environment, and as such is ignored by most interfaces. The shared library, CTypes, and FIFO
interfaces provide methods for properly resetting the environment. 

Player B's actions are defined to be 18 + the equivalent action value for Player A. For example, Player B's up action is \verb+up+ (20). 
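
For agents that work with the integer values directly (for example over the FIFO interface), the table and the Player B rule can be written down as in the Python sketch below. The names are hypothetical and simply mirror the table above; they are not identifiers from \verb+common/Constants.h+.

\begin{verbatim}
# Index in this list = Player A's action value.
PLAYER_A_ACTIONS = [
    "noop", "fire", "up", "right", "left", "down",
    "up-right", "up-left", "down-right", "down-left",
    "up-fire", "right-fire", "left-fire", "down-fire",
    "up-right-fire", "up-left-fire",
    "down-right-fire", "down-left-fire",
]
PLAYER_B_OFFSET = 18


def player_a_action(name):
    return PLAYER_A_ACTIONS.index(name)


def player_b_action(name):
    # Player B's value is 18 plus Player A's equivalent value.
    return PLAYER_B_OFFSET + player_a_action(name)
\end{verbatim}

With these definitions, \verb+player_b_action("up")+ evaluates to 20, matching the example above.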

In addition to the regular ALE actions, the following (somewhat deprecated) actions are also processed by the 
FIFO interface:

\begin{center}
\begin{tabular}{|r|r|r|}
\hline
save-state (43) & load-state (44) & system-reset (45) \\
\hline
\end{tabular}
\end{center}

\subsection{Terminal States}

Once the end of episode is reached (a terminal state in RL terminology), no further emulation 
takes place until the appropriate reset command is sent. This command is distinct from the Atari 
2600 reset. This ``system reset'' avoids odd situations where the player can reset the game
through button presses, or where the game normally resets itself after a number of frames. This 
makes for a cleaner environment interface. With the exception of the RL-Glue interface, which 
automatically resets the environment, the interfaces described here all provide a system reset
command or method.

\subsection{Saving and Loading} 

State saving and loading operates in a stack-based manner: each call to save stores the current
environment state onto a stack, and each call to load restores the last saved copy and removes
it from the stack. The ALE 0.2 save/load mechanism, provided for backward compatibility, instead
overwrites its saved copy when a save is requested. When loading a state, the currently saved copy
is preserved. 
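
The difference between the two mechanisms can be pictured with the following conceptual Python model. It only illustrates the semantics described above and is not ALE's implementation; both class names are hypothetical.

\begin{verbatim}
class StackSaver:
    # Current behaviour: save pushes, load pops.
    def __init__(self):
        self._stack = []

    def save(self, state):
        self._stack.append(state)

    def load(self):
        return self._stack.pop()


class LegacySaver:
    # ALE 0.2 behaviour: save overwrites, load keeps the saved copy.
    def __init__(self):
        self._saved = None

    def save(self, state):
        self._saved = state

    def load(self):
        return self._saved
\end{verbatim}

After saving states \verb+a+ and then \verb+b+, two loads return \verb+b+ then \verb+a+ in the stack-based model, but \verb+b+ both times in the legacy model.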

This functionality is provided in the FIFO, shared library and CTypes interfaces. The shared
library interface additionally provides state cloning/restoring capabilities.

\subsection{Colour Averaging}

Many Atari 2600 games display objects on alternating frames (sometimes even less frequently).
This can be an issue for agents that do not consider the whole screen history. By default, 
\emph{colour averaging} is not enabled. If enabled, the environment output (as observed by agents)
is a weighted blend of the last two frames. This behaviour can be turned on using the
command-line argument \verb+-color_averaging+ (or the \verb+setBool+ function).

\subsection{Action Repeat Stochasticity}

Beginning with ALE 0.5.0, there is now an option (enabled by default) to add 
\emph{action repeat stochasticity} to the environment. With probability $p$ (default: $p = 0.25$),
the previously executed action is executed again during the next frame, ignoring the agent's
actual choice. This value can be modified using the option \verb+repeat_action_probability+.
The default value was chosen as the highest value for which human play-testers
were unable to detect any delay or control lag.
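
The mechanism can be simulated outside ALE with the Python sketch below, for instance to study its effect on a fixed action sequence. This is an illustrative model rather than ALE's code: the class name is hypothetical, and treating noop (0) as the ``previous'' action before the first frame is an assumption.

\begin{verbatim}
import random


class StickyActions:
    def __init__(self, probability=0.25, seed=None):
        self.probability = probability
        # Private generator, independent of the agent's randomness.
        self._rng = random.Random(seed)
        self._previous = 0  # assumption: noop before the first frame

    def step(self, chosen):
        # With probability p, repeat the previously executed action.
        if self._rng.random() < self.probability:
            executed = self._previous
        else:
            executed = chosen
        self._previous = executed
        return executed
\end{verbatim}

Calling \verb+step+ once per frame with the agent's choice returns the action that would actually be executed under this model.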

The motivation for introducing action repeat stochasticity was to help separate \emph{trajectory
optimization} research from \emph{robust controller optimization}, the latter often being the 
desired outcome in reinforcement learning (RL). We strongly encourage RL researchers to use 
the default stochasticity level in their agents, and clearly report the setting used. More 
details on the effects of action repeat stochasticity will be made available in future 
publications.

Note that, beginning with ALE 0.5.0, we have also removed the use of \verb+rand()/srand()+ from
ALE. Agents' own randomization should therefore not affect the action repeat stochasticity.

\subsection{Minimal Action Set}

It may sometimes be convenient to restrict the agent to a smaller action set. This can be
accomplished by querying the \verb+RomSettings+ class using the method 
\verb+getMinimalActionSet+. This then returns a set of actions judged ``minimal'' to play a given
game. Due to the potentially high impact of this setting on performance, we encourage researchers
to clearly report the method used in their experiments. 


\section{Miscellaneous}

This section provides additional relevant ALE information.

\subsection{Audio and Visual (Screen) Output}\label{subsec:displaying_screen}

ALE offers screen display and audio capabilities via the Simple DirectMedia Layer (SDL).
Instructions on how to install the SDL library, as well as enabling SDL support within ALE, are provided in Section \ref{subsec:installation_compilation}. 
Screen display can be enabled using the \verb+display_screen+ option (default: \verb+false+), 
and sound playback using the \verb+sound+ option (default: \verb+false+).
SDL support has been tested under Linux and Mac OS X. 

\subsection{Recording Movies}

ALE now provides support for recording frames; if sound is enabled (Section \ref{subsec:displaying_screen}), it is also possible to record audio output.
An example C++ program is provided which will record both visual and audio output for a single episode of play. This program is located at

\begin{center}
\texttt{doc/examples/videoRecordingExample.cpp}
\end{center}

Compiling and running this program will create a directory \texttt{record}\footnote{The example program creates this directory, using a system call to \texttt{mkdir}. If this fails on your machine, you will need to manually create this directory.} in which frames will be saved sequentially and named according to their frame numbers. Thus, if the episode lasts 683 frames then the files \verb+record/000000.png+ to \verb+record/000682.png+ are created. Furthermore, sound output is also recorded as \verb+record/sound.wav+. The following options control recording behaviour:

\small{
\begin{verbatim}
  sound <true|false> -- whether to enable sound output (set to true for recording)
    default: false
  record_screen_dir -- path to record screens; if empty, no recording occurs
    default: ""
  record_sound_filename -- path to single wav file to be recorded; 
            if empty, no recording occurs
    default: ""
\end{verbatim}
}
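
When post-processing a recording in Python, the frame file names described above can be reproduced as in the following sketch. The helper \verb+frame_path+ is hypothetical, and the six-digit zero padding is inferred from the example file names given earlier in this section.

\begin{verbatim}
import os


def frame_path(record_dir, frame_number):
    # Frames are numbered from 0 and zero-padded to six digits.
    return os.path.join(record_dir, "{:06d}.png".format(frame_number))


def expected_frames(record_dir, episode_length):
    # One file per frame of the episode, in playback order.
    return [frame_path(record_dir, n) for n in range(episode_length)]
\end{verbatim}

For an episode of 683 frames, \verb+expected_frames("record", 683)+ lists 683 paths, the last one ending in \verb+000682.png+.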

Once frames and/or sound have been recorded, they may be joined into a movie file using the external program \texttt{ffmpeg} (installable on Mac OS X and most *nix systems through a package manager). For your convenience, two example scripts are provided:

\begin{itemize}
    \item{\texttt{doc/scripts/videoRecordingExampleJoinMacOSX.sh}}
    \item{\texttt{doc/scripts/videoRecordingExampleJoinUnix.sh}}
\end{itemize}

These should be run from the same directory that the C++ example was run from. Unfortunately \texttt{ffmpeg} is a complicated beast and taming it may require tweaking specific to your system configuration. Please contact us if you would like to provide an example script for a different configuration.

Here is a full, step-by-step example on Mac OS X (after building the project):
\begin{verbatim}
> cd ale_0_5
ale_0_5> doc/examples/videoRecordingExample space_invaders.bin

A.L.E: Arcade Learning Environment (version 0.5.0)
[Powered by Stella]

[ usual ALE output ]

Recording screens to directory: record

Recording complete. See manual for instructions on creating a video.

ale_0_5> doc/scripts/videoRecordingExampleJoinMacOSX.sh

ffmpeg version 2.6.1 Copyright (c) 2000-2015 the FFmpeg developers
  built with Apple clang version 4.1 (tags/Apple/clang-421.11.65) ... 

[ loads of output ]

ale_0_5> open agent.mov
\end{verbatim}

\section{Troubleshooting / FAQ} \label{sec:troubleshooting}

More questions and answers may be found on the ALE-users mailing list:
\begin{center}
\url{https://groups.google.com/forum/#!forum/arcade-learning-environment}
\end{center}

\begin{itemize}
\item I downloaded ALE and I installed it successfully but I cannot find any ROM file at \verb+roms/+.
Do I have to get them somewhere else?

Yes. We do not distribute Atari 2600 ROMs. 

\item The C++ examples compile, but when run give the following error: 
\begin{center}
\verb+dyld: Library not loaded: libale.so+ .
\end{center} 
You may need to add ALE to your library path:

\begin{verbatim}
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/path/to/ale_0_5
\end{verbatim}

or, under Mac OS X,

\begin{verbatim}
export DYLD_LIBRARY_PATH=$DYLD_LIBRARY_PATH:/path/to/ale_0_5
\end{verbatim}

Alternatively, run the examples from the directory in which \verb+libale.so+ resides.


\item I am trying to run ALE via command line (FIFO interface, or RL-Glue agent) but I keep 
getting an ``unsupported ROM file'' error. What am I doing wrong?

There is no single fix for this error, but in our experience a common cause is forgetting to 
specify the ROM file as the last argument on the command line, in lower 
case. Moreover, each game is tied to an internal parser which relies on a specific filename (e.g. \verb+pong.bin+ for \textsc{Pong}). This is specified in the header file (hpp) for the corresponding parser (e.g. \verb+src/games/supported/Pong.hpp+). You may want to check whether the ROM you are 
trying to load is supported by the current version of ALE.

\item I am having problems when compiling with the option \verb+USE_SDL=ON+ on Mac OS X.
The compiler complains it was not able to find the proper SDL files. What should I do?

We spent a considerable amount of time trying to make sure CMake would properly find the correct 
path in different situations. Something we realized is that sometimes, if the user has two 
installations (generally one via \verb+brew+ and another via \verb+dmg+ package), conflicts may 
arise. Generally it is enough to set the variable \verb+SDL_LIBRARY+ as below (the most common
path where SDL is installed is used here):
\vspace{-0.3cm}
\begin{verbatim}
SDL_LIBRARY:STRING=/usr/local/lib/libSDLmain.a;/usr/local/lib/libSDL.dylib;
-framework Cocoa
\end{verbatim}

\item I want to be able to extract from the game the number of lives my agent still has. How can I do it?

Previous versions of ALE did not support this. Since version 0.5.0, the number of remaining 
lives is available through the function \verb+lives()+.

\item When extracting the screen I realized that I never see a pixel with an odd value. However,
the pixel is represented as a byte. Shouldn't it be up to 255 with odd and even values?

No, the Atari 2600 console (NTSC format) only supports 128 colours. Therefore, even though the colours are represented 
in a byte, in fact only the seven most significant bits are used. Consequently you have
to right-shift the colour byte by 1 bit to produce consecutively numbered colour values.

\item When trying to run ALE I get a linking error stating \verb+-lrlagent+ could not be found.  
What should I do?

The \verb+rlagent+ library is provided by RL-Glue. We therefore recommend verifying your 
installation of RL-Glue and making sure it is properly found when compiling ALE code. If
RL-Glue support is not needed, consider running cmake with \verb+USE_RLGLUE=OFF+ (or disabling
it in the makefile).

\item When running/compiling my own C++ agent with \verb+display_screen+/SDL enabled, I get the following error(s):

\begin{verbatim}
Terminating app due to uncaught exception 'NSInternalInconsistencyException', 
reason: 'Error (1000) creating CGSWindow on line 259'

(and/or)

Undefined symbols for architecture x86_64:
  "_SDL_main", referenced from:
      -[SDLMain applicationDidFinishLaunching:] in libSDLmain.a(SDLMain.o)
ld: symbol(s) not found for architecture x86_64
\end{verbatim}

Make sure your code contains \verb+#include <SDL.h>+ and that \verb+__USE_SDL+ is defined. Also, compiling under Mac OS X requires the inclusion of the flag \verb+-framework Cocoa+. See the shared library example for a working example. 

\item The Python interface isn't working! It can't find \verb+libale_c.so+.

Make sure to compile ALE using CMake. This will build the library required by the Python interface. 

\end{itemize}
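
As a small illustration of the answer above concerning pixel values, the right shift can be written in Python as follows; \verb+colour_index+ is a hypothetical helper name.

\begin{verbatim}
def colour_index(pixel_byte):
    # Only the seven most significant bits carry information, so
    # shifting right by one bit maps the byte onto 0..127.
    return pixel_byte >> 1
\end{verbatim}

Under this mapping, the even byte values 0, 2, 4, \ldots, 254 become the consecutive colour values 0, 1, 2, \ldots, 127.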

\section{ALE Interface Specifications}\label{sec:functions}

The functions available in the ALE interface are listed below, with a description of their 
behaviour. They are grouped into subsections for clarity.

  \subsection{Initialization}

  \indent \indent \verb+ALEInterface(bool display_screen)+: ALE constructor. If the 
  \verb+display_screen+ parameter is set to \verb+true+, and ALE was compiled
  with SDL, the game display will be presented. If set to \verb+false+ one will not see
  the game being played.
  
  \verb+void loadROM(string rom_file)+: Resets ALE and then loads a game. After this call
  the game should be ready to play. If one changes (or sets) a setting (Section~\ref{sec:getSet}), 
  it is necessary to call this function after the change so it can take effect.
  
  \subsection{Parameters setting and retrieval}\label{sec:getSet}
  
  \indent \indent \verb+string getString(const string& key)+: Gets the value of any flag passed
  as parameter that has a string value; \emph{e.g.}: \verb+getString("record_sound_filename")+.

  \verb+int getInt(const string& key)+: Gets the value of any flag passed as parameter that has 
  an integer value; \emph{e.g.}: \verb+getInt("frame_skip")+.

  \verb+bool getBool(const string& key)+: Gets the value of any flag passed as parameter that has 
  a boolean value; \emph{e.g.}: \verb+getBool("restricted_action_set")+. 
  
  \verb+float getFloat(const string& key)+: Gets the value of any flag passed as parameter that has 
  a float value; \emph{e.g.}: \verb+getFloat("repeat_action_probability")+.   
  
  \verb+void setString(const string& key, const string& value)+: Sets the value of any flag
  that has a string type; \emph{e.g.}: \verb+setString("record_screen_dir", "record")+.
  \verb+loadROM()+ must be called before the setting will take effect.
  
  \verb+void setInt(const std::string& key, const int value)+: Sets the value of any flag
  that has an integer type; \emph{e.g.}: \verb+setInt("frame_skip", 1)+. \verb+loadROM()+
  must be called before the setting will take effect.
  
  \verb+void setBool(const std::string& key, const bool value)+: Sets the value of any flag
  that has a boolean type; \emph{e.g.}: \verb+setBool("restricted_action_set", false)+.
  \verb+loadROM()+ must be called before the setting will take effect.
  
  \verb+void setFloat(const std::string& key, const float value)+: Sets the value of any flag
  that has a float type; \emph{e.g.}: \verb+setFloat("repeat_action_probability", 0.25)+.
  \verb+loadROM()+ must be called before the setting will take effect.
  
  \subsection{Acting and Perceiving}
  
  \indent \indent \verb+reward_t act(Action action)+: Applies an action to the game and returns the
  reward. It is the user's responsibility to check if the game has ended and reset when necessary
  (this method will keep pressing buttons on the game over screen).
  
  \verb+bool game_over()+: Indicates if the game has ended.
  
  \verb+void reset_game()+: Resets the game, but not the full system (it is not equivalent
  to unplugging the console from electricity).
  
  \verb+ActionVect getLegalActionSet()+: Returns the vector of legal actions (all the 18 actions).
  This should be called only after the ROM is loaded.
  
  \verb+ActionVect getMinimalActionSet()+: Returns the vector of the minimal set of actions
  needed to play the game (all actions that have some effect on the game). This should be
  called only after the ROM is loaded.
  
  \verb+int getFrameNumber()+: Returns the current frame number since the loading of the ROM.
  
  \verb+const int lives()+: Returns the agent's remaining number of lives. If the game does not have 
  the concept of lives (\emph{e.g.} \textsc{Freeway}), this function returns 0.
  
  \verb+int getEpisodeFrameNumber()+: Returns the current frame number since the start of the
  current episode.
  
  \verb+const ALEScreen &getScreen()+: Returns a matrix containing the current game screen.
  
  \verb+void getScreenGrayscale(pixel_t *grayscale_output_buffer)+: Expects an array of length
  width $\times$ height (generally $160 \times 210 = 33{,}600$) and fills it with the grayscale
  colours of the current screen.
  
  \verb+void getScreenRGB(pixel_t *output_rgb_buffer)+: Expects an array of length
  $3 \times$ width $\times$ height (generally $3 \times 160 \times 210 = 100{,}800$) and fills it
  with the RGB colours. The first positions contain the red colours, followed by the green colours
  and then the blue colours.
 
  \verb+const ALERAM &getRAM()+: Returns a vector containing current RAM content (byte-level).
  
  \verb+void saveState()+: Saves the current state of the system so that it can be recovered
  later, \emph{e.g.} in search algorithms.
  
  \verb+void loadState()+: Loads the state of the system previously saved with \verb+saveState()+.

  \verb+ALEState cloneState()+: Makes a copy of the environment state. This copy does \textbf{not}
  include pseudo-randomness, making it suitable for planning purposes.

  \verb+ALEState cloneSystemState()+:  This makes a copy of the system and environment state,
  suitable for serialization. This includes pseudo-randomness and so is \textbf{not} suitable for planning
  purposes.

  \verb+void restoreState(const ALEState& state)+:  Reverse operation of \verb+cloneState()+. This does not
  restore pseudo-randomness, so that repeated calls to \verb+restoreState()+ in the stochastic controls setting
  will not lead to the same outcomes. By contrast, see \verb+restoreSystemState+.

  \verb+void restoreSystemState(const ALEState& state)+: Reverse operation of \verb+cloneSystemState+.
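
The difference between the two pairs of methods can be illustrated with a toy model. The sketch below is not ALE's implementation (the real \verb+ALEState+ is opaque and is not defined this way). \verb+ToyEnv+, \verb+ToyState+, \verb+ToySystemState+ and the two demonstration functions are hypothetical. The sketch only assumes what the text above states: one snapshot omits the pseudo-random generator and the other includes it.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Toy environment: one number of "game state" plus a pseudo-random generator.
struct ToyEnv {
    std::uint32_t position = 0;
    std::mt19937 rng;  // default-seeded, so the sketch is reproducible

    // One stochastic step: the outcome depends on the next random draw.
    std::uint32_t step() {
        position = static_cast<std::uint32_t>(rng());
        return position;
    }
};

// Snapshot without pseudo-randomness (in the spirit of cloneState()).
struct ToyState { std::uint32_t position; };
// Snapshot with pseudo-randomness (in the spirit of cloneSystemState()).
struct ToySystemState { std::uint32_t position; std::mt19937 rng; };

ToyState cloneState(const ToyEnv& e) { return ToyState{e.position}; }
ToySystemState cloneSystemState(const ToyEnv& e) {
    return ToySystemState{e.position, e.rng};
}
// Leaves the generator untouched, so later draws continue where they were.
void restoreState(ToyEnv& e, const ToyState& s) { e.position = s.position; }
// Rewinds the generator as well, so later draws are replayed exactly.
void restoreSystemState(ToyEnv& e, const ToySystemState& s) {
    e.position = s.position;
    e.rng = s.rng;
}

// Restores the same plain snapshot twice; reports whether both steps agree.
bool plainRestoreRepeats() {
    ToyEnv env;
    ToyState s = cloneState(env);
    restoreState(env, s);
    std::uint32_t first = env.step();
    restoreState(env, s);
    std::uint32_t second = env.step();
    return first == second;
}

// Same experiment with the snapshot that includes the generator.
bool systemRestoreRepeats() {
    ToyEnv env;
    ToySystemState s = cloneSystemState(env);
    restoreSystemState(env, s);
    std::uint32_t first = env.step();
    restoreSystemState(env, s);
    std::uint32_t second = env.step();
    return first == second;
}
```

In this toy model the plain snapshot produces two different outcomes, which is what a planner sampling several rollouts from one state relies on, while the system snapshot replays the same outcome each time.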
  \subsection{Recording trajectories}
   
  \indent \indent \verb+void saveScreenPNG(const string& filename)+: Saves the current screen as
  a \verb+png+ file.
  
  \verb+ScreenExporter *createScreenExporter(const string &path) const+: Creates a 
  ScreenExporter object which can be used to save a sequence of frames. Frames are saved 
  in the directory \verb+path+, which must already exist. This is used to generate movies depicting
  the behavior of agents.
  
\section{Command-line Arguments}\label{sec:arguments}

Command-line arguments are passed to ALE before the ROM filename. Within ALE, these are converted
into options named after the argument without its leading hyphen (`\verb+-+'). Currently,
setting options from the command line is only relevant to the FIFO and RL-Glue interfaces.
When using the C++ or Python interface, one should instead directly invoke the relevant setter
functions (\emph{e.g.} \verb+setInt()+; see Section~\ref{sec:getSet}). The configuration file
\verb+ale_0_5/stellarc+ can also be used to set frequently used options.
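
The conversion from arguments to options described above can be sketched as follows. This is an illustrative parser and not ALE's own; \verb+ParsedArgs+ and \verb+parseArgs+ are hypothetical names. It assumes that every option is written as a key beginning with a hyphen, optionally followed by one value, and that the ROM filename is the last argument.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Result of splitting a command line into options and the ROM filename.
struct ParsedArgs {
    std::map<std::string, std::string> options;  // keys stored without '-'
    std::string rom;
};

// Illustrative only. A key with no value (such as "-help") is stored with
// an empty string. Values that themselves start with '-' (for example
// negative numbers) are not supported by this sketch.
ParsedArgs parseArgs(const std::vector<std::string>& args) {
    assert(!args.empty());  // the ROM filename is mandatory here
    ParsedArgs out;
    out.rom = args.back();
    const std::size_t last = args.size() - 1;  // index of the ROM filename
    std::size_t i = 0;
    while (i < last) {
        const std::string& key = args[i];
        assert(key.size() > 1 && key[0] == '-');  // options start with '-'
        std::string value;
        if (i + 1 < last && args[i + 1][0] != '-') {
            value = args[i + 1];  // "-key value" pair
            i += 2;
        } else {
            i += 1;               // flag without a value
        }
        out.options[key.substr(1)] = value;  // drop the leading hyphen
    }
    return out;
}
```

For instance, the argument list \verb+-frame_skip 4 -display_screen true pong.bin+ (where \verb+pong.bin+ is a made-up ROM name) would yield the options \verb+frame_skip+ and \verb+display_screen+ together with the ROM filename.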

\subsection{Main Arguments}
\small{
\begin{verbatim}

  -help -- prints out help information

  -game_controller <fifo|fifo_named|rlglue> -- selects an ALE interface
    default: unset

  -random_seed <###> -- sets the ALE random seed; if set to 0, the current
    time is used instead
    default: 0

  -display_screen <true|false> -- if true and SDL is enabled, displays ALE screen
    default: false
    
  -sound <true|false> -- if true and SDL is enabled, game sounds are enabled
    default: false


\end{verbatim}
}

\subsection{Environment Arguments}

\small{
\begin{verbatim}

  -max_num_frames ### -- max. total number of frames, or 0 for no maximum
    (not available through the shared library interface, i.e. it cannot be
    set by C++ or Python code linking directly to the shared library)
    default: 0

  -max_num_frames_per_episode ### -- max. number of frames per episode
    default: 0

  -frame_skip ### -- frame skipping rate; 1 indicates no frame skip 
    default: 1

  -color_averaging <true|false> -- if true, enables colour averaging 
    default: false

  -record_screen_dir [save_directory] -- saves game screen images to
    save_directory
     
  -repeat_action_probability ### -- stochasticity in the environment. It is
    the probability that the previous action will be repeated without
    executing the new one
    default: 0.25
\end{verbatim}
}
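
The effect of \verb+repeat_action_probability+ can be modelled in a few lines. The sketch below is a simplified illustration, not ALE's internal code; the class \verb+StickyActions+, its \verb+filter()+ method and the explicit seed are hypothetical. It applies the rule once per requested action: with probability $p$ the previous action is executed again and the new one is ignored.

```cpp
#include <cassert>
#include <random>

// Simplified model of action repetition (hypothetical, for illustration).
class StickyActions {
public:
    StickyActions(double repeat_probability, int initial_action, unsigned seed)
        : p_(repeat_probability), previous_(initial_action), rng_(seed) {
        assert(p_ >= 0.0 && p_ <= 1.0);
    }

    // Returns the action that is actually executed for this request.
    int filter(int requested) {
        // mt19937 yields 32-bit values, so u lies in [0, 1): p = 0 never
        // repeats the previous action and p = 1 always repeats it.
        const double u = static_cast<double>(rng_()) / 4294967296.0;
        if (u >= p_) {
            previous_ = requested;  // the new action goes through
        }
        return previous_;           // otherwise the old action is kept
    }

private:
    double p_;
    int previous_;
    std::mt19937 rng_;
};
```

With the default value of 0.25, roughly one request in four would be replaced by the previously executed action in this model.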

\subsection{FIFO Interface Arguments}

\small{
\begin{verbatim}
  -run_length_encoding <true|false> -- if true, encodes data using run-length
    encoding
    default: true
\end{verbatim}
}
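
For readers unfamiliar with run-length encoding, the idea is to replace each run of identical pixel values with a single (value, length) pair. The sketch below shows the generic technique only. It does not claim to reproduce the wire format used by the FIFO interface; the \verb+Run+ type, the function names and the cap of 255 pixels per run are assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// One run: (pixel value, number of consecutive repetitions).
using Run = std::pair<std::uint8_t, std::uint8_t>;

// Generic run-length encoder. Runs are capped at 255 so that the length
// fits in one byte; a longer run is split into several pairs.
std::vector<Run> runLengthEncode(const std::vector<std::uint8_t>& pixels) {
    std::vector<Run> runs;
    for (std::uint8_t px : pixels) {
        if (!runs.empty() && runs.back().first == px && runs.back().second < 255) {
            ++runs.back().second;        // extend the current run
        } else {
            runs.push_back(Run(px, 1));  // start a new run
        }
    }
    return runs;
}

// Inverse operation: expands every pair back into its pixels.
std::vector<std::uint8_t> runLengthDecode(const std::vector<Run>& runs) {
    std::vector<std::uint8_t> pixels;
    for (const Run& r : runs) {
        assert(r.second > 0);  // the encoder never emits empty runs
        pixels.insert(pixels.end(), static_cast<std::size_t>(r.second), r.first);
    }
    return pixels;
}
```

Atari screens often contain long horizontal stretches of a single colour, which is why such an encoding can shorten the transmitted screen data considerably.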

\subsection{RL-Glue Interface Arguments}

\small{
\begin{verbatim}
  -send_rgb <true|false> -- if true, RGB values are sent for each pixel
    instead of the palette index values
    default: false
    
  -restricted_action_set <true|false> -- if true, agents use a smaller set of 
    actions (RL-Glue interfaces only)
    default: false    
\end{verbatim}
}

\end{document}
